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Signal  processing  computational  needs 


Jeffrey  M.  Speiser 
Naval  Ocean  Systems  Center 
San  Diego,  CA  92152 


Abstract 


Previous  reviews  of  signal  processing  computational  needs  and  their  systolic  implementation  have  emphasized  the  need  or  a  unail  set 
of  matrix  operations,  primarily  matnx  multiplication,  orthogonal  triangularization,  tn angular  baclcsolve.  angular  value  decomposition,  and 
the  generalized  singular  value  decomposition.  Algorithms  and  architectures  for  these  taslcs  are  sufficiently  well  understood  to  oegin  transi¬ 
tioning  from  research  to  exploratory  development.  Suostantial  progress  has  also  been  reported  on  parallel  algorithms  tor  updating  sym¬ 
metric  eigensystems  and  the  singular  value  decomposition.  Another  problem  which  has  proved  to  be  easier  than  expected  is  mner  -roquet 
computation  for  high-speed  high  resolution  predictive  analog-to-digital  conversion.  Although  inner  product  computation  in  j  general 
setting  will  require  Oflog  n)  ume  via  a  tree,  the  special  structure  of  the  prediction  problem  permits  the  use  of  a  systouc  transversal  liter 
producing  a  new  predicted  value  in  time  0(  1 ). 


Problem  areas  which  are  still  in  an  early  stage  of  study  include  parallel  algorithms  for  the  Wigner-Vdle  Distribution  tumtion 
approximation,  inequality  constrained  least  squares,  and  the  total  least  squares  problem. 


Introduction 


Representative  areas  of  modem  signal  processing  which  have  substantial  computational  requirements  for  real-time  implementation 
include  beamforming,  direction  finding,  spectrum  analysis,  and  image  processing.  An  important  area  which  has  received  less  mention  is 
intelligent  anaiog-to-digitai  conversion. 


Beamforming  and  direction  finding 


Cassicai  frequency-domain  beamforming  is  the  spatial  equivalent  of  a  matched  filter.  To  form  a  beam  in  one  direction  at  one 
frequency,  t  a  necessary  to  form  the  inner  product  of  the  vector  of  complex  amplitudes  at  the  sensor  array  elements  with  a  steering  vector 
for  the  specified  look  direction.  1  If  the  processing  bandwidth  is  small  compared  to  the  center  frequency,  then  the  number  ot  operations 
i  complex  multiply-idds)  required  per  second  to  realize  the  beamforming  by  matrix-vector  multiplication  will  be  B*W*E.  where  B  ;s  the 
numoer  of  beams  (more  generally  the  number  of  cells  in  direction/focus  depth),  W  is  the  bandwidth  in  Hertz,  and  E  is  the  number  of 
elements  in  the  array.  For  a  limited  number  of  special  array  geometries  and  corresponding  special  choices  of  look  directions,  it  a  possible 
:o  reduce  the  number  of  operations  by  using  spatial  convolutions  or  discrete  Fourier  transforms  iDFTs)  realized  via  the  Fast  rouner 
Transform  ■  FFT).  ‘  However,  general  matrix  multiplication  will  be  needed  for  randomly  time-varying  array  geometries  as  weil  as  :or  arrays 
suoject  to  flexure  or  required  to  conform  to  special  surface  shapes. 


Several  interference  cancellation  techniques  require  the  solution  of  linear  least  squares  problems:  An  adaptive  combiner  using  pre¬ 
formed  beams  requires  the  solution  of  an  unconstrained  linear  least  squares  problem.  Minimum  variance  distortionless  response  i  MVDR) 
beamforming  requires  the  solution  of  a  least  squares  problem  with  one  linear  constraint.— 4  More  recent  versions  use  multiple  linear  con¬ 
straints  to  avoid  the  formation  of  deep  nulls  in  directions  too  close  to  the  look  directions.4^  .Although  least  squares  adaptive  beamformers 
have  frequently  been  implemented  using  gradient  descent  and  similar  iterative  methods,  such  methods  can  suffer  from  slow  convergence 
■when  the  interference  is  strong  compared  to  the  signal  and  noise,  arid  the  data  covariance  matnx  therefore  has  a  large  condition  number. 
Since  the  cost  of  computation  is  continually  decreasing  relative  to  the  cost  of  aperture  in  space  or  time,  it  is  desirable  to  avoid  statistical 
iteration,  and  rather  to  provide  at  each  time  the  best  '.east  squares  solution  possible  'with  the  available  data.  Although  the  signal  processing 
textbooks  describe  such  a  solution  in  terms  of  direct  inversion  of  the  sample  covanance  matrix.  -  it  is  numerically  far  preferable  to  solve  the 
least  squares  problem  directly  with  the  data  matrix,  using  either  the  singular  value  decomposition  (SVD)  or  orthogonal  triangularization 
techniques  such  as  Givens  rotations.  Householder  reflections,  or  the  modified  Gram-Schmidt  method. 6  Some  authors  have  proposed  par¬ 
allel  architectures  based  on  hyperbolic  rotations  or  a  hyperbolic  version  of  the  Householder  transformation.  Such  transformations  are  not 
unitary,  and  are  not  guaranteed  to  have  the  good  numerical  stability  of  true  unitary  transformations.  For  MVDR  implemented  via 
orthogonal-tnangulanzation,  with  a  number  of  look  directions  greater  than  the  number  of  elements  in  the  array,  and  with  the  adaptive 
weights  updated  for  each  data  sample,  the  number  of  arithmetic  operations  required  per  second  will  be  appproximately  B*'V*E*E.  or 
larger  than  the  requirement  for  classical  beamforming  by  about  a  factor  of  E,  the  number  of  elements  in  the  array. 


While  adaptive  interference  cancellation  techniques  incorporate  a-priori  information  in  the  form  of  the  assumption  ot  point  sources  ot 
interference,  eigenvector  and  eigenvalue-eigenvector  based  direction  finding  methods  use  two  covariance  matrices:  prior  knowledge  (perhaps 
in  the  form  of  a  model)  of  the  noise  spatial  covanance  structure,  and  measurement  of  the  total  signal-pius-noise  spatial  covanance  matrix.  A 
survey  of  such  methods  through  mid  1985  j  available.  Although  a  quantitative  theory  of  the  performance  of  such  methods  remains 
difficult,  many  stimulation  studies  and  a  few  experimental  ones  have  shown  that  when  the  pr.or  information  is  correct,  eigenvector  based 
direction  finding  can  provide  resolution  many  times  .finer  than  the  Rayleigh  limit.  A  representative  eigenvector  based  direction  method 
is  the  MUSIC  algorithm  of  R.  Schmidt.'^"1 1  MUSIC  requires  the  following  sequence  of  calculations:  a)  Solution  of  a  generalized  eigen¬ 
value  problem,  b)  Estimating  the  number  of  sources  from  the  multiplicity  of  the  smallest  generalized  eigenvalue.  (With  finite  sample  size, 
this  will  actually  be  a  cluster  of  small  generalized  eigenvalues!,  ci  Determining  the  "noise  subspace”  -  computing  an  orthonormal  basis  for 
the  generalized  eigenvectors  corresponding  to  the  smallest  (or  small)  generalized  eigenvalues,  d)  Forming  the  so-called  "‘direcnon-of-amvai 
spectrum,”  or  equivalently  searching  for  look  directions  whose  steering  vectors  are  orthogonal  or  nearly  orthogonal  to  the  noise  subspace. 
ei  Estimation  of  the  source  covanance  matnx  to  determine  rource  powers  and  to  associate  multiple  arrivals  trom  a  single  source. 
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In  oitler  to  accurately  compute  small  generalized  eigenvalues  and  corresponding  generalized  eigenvectors,  it  is  preferable  to  solve  the 
generalized  eigenvalue  problem  via  the  generalized  singular  value  decomposition  (GSVD)  of  Van  Loan.  I -.13 

For  most  eigenvector-based  direction  finding  techniques,  the  most  computationally  burdensome  step  will  be  the  computation  of  the 
DOA  spectrum.  Although  this  requires  only  matrix-vector  multiplications,  it  requires  a  very  large  number  of  them  -  unless  fast  search 
techniques  are  developed,  it  will  require  one  matrix-vector  prodticr  for  each  resolvable  point  m  the  array  manifold.  For  a  two-dimensional 
antenna  array  with  elements  sensing  diverse  polarizations,  the  required  number  of  matrix -vector  multiplications  per  frequency  bin  update  is 
the  product  of  the  number  of  resolvable  two-dimensional  angles  tunes  the  numoer  of  resolved  polarizations.  One  recent  eigenvector  based 
direction  tinding  method,  the  ESPRIT  technique  of  Paulraj  and  Kuilarh. '•  4  avoids  the  computationally  expensive  DOA  spectrum  calcula¬ 
tion,  but  is  only  applicable  for  very  special  array  geometries  and  noise  fields. 

Spectrum  analysis 

Most  beamforming/direction  finding  techniques  nave  direct  analogs  for  the  problem  of  spectrum  analysis,'  -  due  to  the  similarity  of 
beamforming  for  a  uniformly  spaced  line  array  and  spectrum  analysis  for  a  stationary  random  sequence.  An  extensive  survey  of  modem 
spectrum  estimation  techniques  is  available.'  -a  Qne  class  of  methods  that  should  receive  special  attention  from  a  computation  point  of 
new  is  the  linear  predictive  techniques.  16,15  Frequently,  these  methods  are  implemented  using  fast  Toeplitz  equation  solvers,  such  as  the 
Levinson-Trench  method  or  Durbin’s  method  to  solve  the  prediction  problem.  Although  such  methods  save  arithmetic  operations,  they  can 
produce  poor  numerical  results  when  the  covariance  matrix  is  ill-conditioned.  Such  ill  conditioning  will  occur,  when  there  are  strong  spec¬ 
tral  lines  -  exactly  the  case  where  the  random  process  has  an  all  pole  spectrum  as  assumed  by  linear  predictive  methods.  Improvement  of 
linear  predictive  methods  by  using  the  SVD  to  solve  the  prediction  equations  has  been  reported.  '^.18 

Dense  matrix  computation  needs  for  beam  forming,  direction  finding,  and  spectrum  estimation 

Beamforming,  direction  finding,  and  spectrum  analysis  lead  to  requirements  for  a  fairly  complete  set  oflinear  algebra  operations  for 
dense  matrices:  In  principle  one  needs  matrix -vector  multiplication,  matrix  multiplication,  linear  equation  solution,  matnx  inversion,  least 
squares  solution,  orthogonal  tnangularization,  solution  of  Hermitian  symmetric  eigensystems.  singular  value  decomposition,  and  the  gener¬ 
alized  singular  value  decomposition.  The  above  list  can  be  reduced  to  a)  matrix-vector  multiplication,  b)  orthogonal  triangular  decompos¬ 
ition.  cl  the  SVD.  d)  the  generalized  SVD. 

Matrix  computations  in  image  processing 

The  above  list  of  matnx  operations  also  suffices  for  many  types  of  image  processing.  The  SVD  :s  especially  suitable  for  imagery  with 
unknown  or  time-vanant  statistics,  since  it  generates  a  best  reduced  rank  approximation  to  a  data  block,  thus  constructing  i  transform  basis 
matched  to  the  current  data  block. '  8a  ^  eigenvector  based  image  registration  technique  has  been  reported  whicn  uses  signal  subspace 
methods  quite  similar  to  the  MUSIC  algorithm.  1^-31 

Image  restoration  problems  wiil  require  a  wider  range  of  minimization  techniques  than  unconstrained  and  linearly  constrained  least 
squares  solution.  Promising  methods  include  least  squares  (possibly  damped  or  regularized)  with  nonnegativity  or  other  nonlinear  con¬ 
straints*-*-^  and  LI  norm  minimization. The  use  of  LI  norm  minimization  has  also  been  described  for  deconvolution  with  unproved 
tolerance  of  noise  bursts-^  and  for  spectrum  estimation  with  improved  tolerance  of  outliers.-8 


Predictive  analog-to-digital  conversion 

Techniques  have  been  proposed  for  improving  the  dynamic  range  and  accuracy  of  analog  to  digital  converters  (A/Ds)  by  incorporating 
a  linear  prediction  using  coarsely  quantized  past  samples.30  To  implement  such  convertors  with  high  sample  rates  it  is  necessary  to  mini¬ 
mize  the  computational  latency  in  the  predictor,  since  the  allowable  1  atency  will  be  Jess  than  the  sampling  interval.  Predictors  using  weights 
which  are  independent  of  the  data  statistics  (other  than  bandwidth)  have  been  described  by  Brown^ '  and  Splettstosser.8-.33  Such  methods 
require  a  digital  inner  product  computation  with  a  fixed  weight  vector  for  each  new  data  sample.  The  multiplications  needed  for  such  an 
isolated  inner  computation  can  be  performed  in  parallel,  so  only  one  multiplication  is  needed.  However,  if  the  summation  is  performed  by 
a  binary  tree  of  adders,  logm  N)  addition  times  are  required,  so  a  lower  latency  method  for  inner  product  computation  or  summation  would 
be  useful,  even  at  the  cost  of  decreased  efficiency.  Since  the  predictor  output  may  be  viewed  as  the  result  of  convolving  the  previously 
quantized  values  with  a  fixed  set  of  filter  coefficients,  the  latency  may  be  reduced  to  approximately  one  multiply/add  time  by  using  a 
systolic  convolver.  Several  suitable  designs  have  been  discussed  by  H.T.  Kung.89 

Parallel  matrix  algorithms  and  architectures:  non-iterative 

Highly  efficient  multiplication  of  matrices  may  be  performed  using  a  rectangular  systolic  array  in  the  “engagement  processor" 
mod e.3-*.35  ,\n  engagement  processor  provided  with  multiple  planes  of  memory  can  also  efficiently  perform  the  oartitioned  muitioiicction 
of  large  matrices.^4-1-  or  the  partitioned  inversion  of  strongly  nonsinguiar  matrices.  Gentleman  and  Kung  have  described  an  efficient  tri¬ 
angular  systolic  architecture  for  orthogonal  reduction  of  matrices  to  t-angular  form  oa  Givens’  rotations  and  its  ise  for  least  squares  solu¬ 
tion  when  combined  with  a  linear  systolic  array  for  triangular  backsubsiitution.-'^  Tims  architecture  can  also  efficiently  update  suen  a 
reduction  when  an  additional  row  is  added,  or  indeed  can  continuously  update  'with  new  rows  corresponding  to  new  data.  Its  application 
to  constrained  least  squares  solution  for  MVDR  beamforming  has  been  described. 8 '  Including  efficient  implementation  of  complex  arith¬ 
metic  and  full  triangular  array  emulation  by  a  subarray.  A  variant  of  the  triangular^svstolic  array  has  been  described  which  permits  direct 
formation  of  the  MVDR  output  without  requiring  a  separate  backsoive  array. 88.39  However  it  appears  difficult  to  efficiently  apply  this 
array  to  problems  requiring  the  formation  of  multiple  simultaneous  beams.  A  difficulty  with  the  solution  of  least  squares  problems  by 
present  systolic  arrays  for  orthogonal  triangulanzaiion  is  that  they  are  unable  to  provide  for  column  pivoting.  Therefore,  if  the  data  matnx 
has  less  than  full  rank,  the  tnangulanzed  matrix  can  be  singular,  ind  the  triangular  backsoive  will  break  down.  In  the  case  of  near  rank 
deficiency,  the  tnangulanzed  matrix  wiil  be  close  to  singular,  and  the  backsoive  may  exhibit  poor  numencaJ  performance.  While  more 
complicated  architectures  may  provide  pivoting  for  matnees  of  fixed  sue.  many  least  squares  signal  processing  applications  will  require 
continuous  updating.  Harper  Whitehouse  has  suggested  that  the  tnangulanzed  problem  be  solved  via  the  SVD.4*8  This  avoids  the  need  tor 


pivoting  in  the  orthogonal  cnanguianzation,  and  permits  solution  of  the  least  squares  prooiem  via  the  pseudoinverse  or  the  tnanguianzed 
matrix.  Regularization  may  easily  be  provided.  Orthogonal  tnangulanzation  via  the  Gentleman-Rung  and  umiiar  arrays  requires  the  comp¬ 
utation  of  a  Givens  rotation  before  the  rotation  coefficients  can  be  propagated  and  applied  to  a  pair  of  rows  of  the  matrix.  Application  of 
l  rotation  to  a  pair  of  elements  requires  only  multiplications  and  additions,  but  computation  of  the  rotation  requires  square  roots,  and  can 
Limit  the  speed  ot  the  processor.  Several  approaches  to  this  prooiem  have  been  proposed:  a)  implementing  the  square  root  cell  using  a 
taster  technology  than  is  used  for  the  interior  cell,  b)  use  of  on-line  arithmetic*'  to  speed  the  computation  of  square  roots,  c)  use  of  scaled 
Givens  rotations,-*'1-''*-3  also  known  as  '"fast  Givens"  or  "square  root  free  Givens."  The  last  approach  requires  special  care  to  select  a  stable 
version  of  the  algorithm,  to  avoid  underflow,  and  also  leads  to  a  significantly  more  complicated  control  structure  for  the  array. 

Parallel  matrix  algorithms  and  architectures:  iterative 

For  matrices  larger  than  u  <  4,  the  computation  of  etgensystems,  the  SVD.  and  their  generalization  is  necessarily  iterative.  There  are 
three  generally  ipplicaole  classes  of  numerically  stable  methods  for  the  dense  symmetric  eigensystem  and  the  SVD  for  which  parallel 
implementations  have  been  proposed:  D  Jacooi  methods,-**  ^  b)  QR  methods,3®-^*  and  c)  the  recently  introduced  tearing  methods. 55-56 
The  Jacobi  methods  are  tar  superior  in  terms  of  simplicity,  regularity,  ind  need  for  only  iocai  communication  and  control.  The  QR  methods 
'QR  or  QL  eigensystem  and  Golub-Reinsch  SVD)  are  significantly  faster  on  a  uniprocessor,  but  introduce  several  new  problems:  a)  The 
need  :or  a  preliminary  reduction  to  tndiagonai  form  for  the  eigensystem  problem  or  to  bidiagonal  form  for  the  SVD.  b)  greatly  increased 
communication  and  control  complexity,  c)  the  possibility  of  required  decomposition  of  the  problem  into  subprobiems.  possibly  of  different 
sues,  when  implicit  slutts  are  used.  The  tearing  methods  are  in  an  early  stage  of  study,  but  may  be  the  most  efficient  of  all.  and  may  be 
used  n  combination  with  the  other  two  methods. 

The  difficult  problem  ot  systolic  computation  of  the  GSVD  by  a  fully  numerically  stable  method  has  now  been  solved. - 

While  nearly  optimally  efficient  systolic  architectures  are  mown  for  the  dense,  non-iterative  matrix  computations,  that  stage  of  devel¬ 
opment  has  not  yet  been  reached  for  the  eigensystem  problem  and  the  SVD.  Other  areas  of  current  research  include  efficient  communica¬ 
tion  between  systolic  subarravs  and  higher  level  language  support  for  parallel  processors. 

Updating  eigensystem  solutions  and  the  SVD 

In  many  signal  processing  applications,  data  is  received  essentially  continuously,  and  it  is  necessary  to  update  previously  computed 
eigensystem  decompositions  or  singular  value  decompositions.  Such  problems  frequently  occur  in  beamforming,  direction  finding,  and 
spectrum  estimations.  .Although  efficient  updating  tecnmques  have  long  been  known  for  orthogonal-triangular  factorization,  the  more 
difficult  problem  of  updating  eigensystems  has  only  recently  been  addressed. -8-59 

LI  norm  model  fitting  and  deconvolution 

Traditionally,  model  fitting  and  ueconvoiuuon  in  signal  processing  has  used  the  IS  norm.  However,  .t  has  long  been  known  in  the 
statistical  literature  that  LI  regression  is  far  more  robust  with  respect  to  outliers  -  bad  observations  o.  long-taiiea  error  distributions.  For 
this  reason,  Li  norm  model  fitting  is  oegmmng  to  receive  attention  in  the  signal  processing  community  for  deconvolution  and  modei-titting 
methods  of  spectral  analysis.  50-61  on  traditional  computers,  Li  model  fitting  has  usually  been  performed  via  variants  of  the  simplex 
algorithm.  The  simplex  algorithms  requires  a  great  deai  of  testing,  branching,  and  data  movement,  and  does  not  seem  to  be  veil  suited  to 
.moiementauon  on  systolic  arrays.  However.  Li  norm  fitting  may  also  be  performed  via  iteratively  rewetg.nted  '.east  squares  techniques0-"00 
using  extensions  of  current  parallel  algorithms  and  architectures  for  orthogonal  tnangulanzation  and  singular  vaiue  decomposition. 
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